Stochastic Variance Reduction Methods for Policy Evaluation

Authors

  • Simon S. Du
  • Jianshu Chen
  • Lihong Li
  • Lin Xiao
  • Dengyong Zhou
Abstract

Policy evaluation is a crucial step in many reinforcement-learning procedures: it estimates a value function that predicts the long-term value of states under a given policy. In this paper, we focus on policy evaluation with linear function approximation over a fixed dataset. We first transform the empirical policy-evaluation problem into a (quadratic) convex-concave saddle-point problem, and then present a primal-dual batch gradient method, as well as two stochastic variance reduction methods, for solving it. These algorithms scale linearly in both sample size and feature dimension. Moreover, they achieve linear convergence even when the saddle-point problem has only strong concavity in the dual variables but no strong convexity in the primal variables. Numerical experiments on benchmark problems demonstrate the effectiveness of our methods.
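
To make the saddle-point formulation concrete, below is a minimal NumPy sketch of an SVRG-style primal-dual solver for the empirical problem min_theta max_w  w^T (b - A theta) - (1/2) w^T C w, where A, b, and C are the standard empirical LSTD quantities built from the fixed dataset. The function name, step sizes, and loop lengths are illustrative assumptions, not the paper's exact algorithm; importance weights for off-policy data are omitted.

```python
import numpy as np

def svrg_saddle_point_pe(phi, phi_next, rewards, gamma=0.99,
                         sigma_theta=0.01, sigma_w=0.01,
                         n_epochs=50, inner_steps=None, seed=0):
    """SVRG-style primal-dual updates for the empirical saddle point
    min_theta max_w  w^T (b - A theta) - 0.5 * w^T C w.
    Hypothetical sketch: step sizes and loop lengths are untuned."""
    rng = np.random.default_rng(seed)
    n, d = phi.shape
    inner_steps = inner_steps or n

    diffs = phi - gamma * phi_next         # rows: phi_t - gamma * phi_t'
    A = phi.T @ diffs / n                  # empirical A
    b = phi.T @ rewards / n                # empirical b
    C = phi.T @ phi / n                    # empirical C

    theta = np.zeros(d)                    # primal: value-function weights
    w = np.zeros(d)                        # dual variable

    for _ in range(n_epochs):
        # Snapshot and its full batch gradients (computed once per epoch).
        theta_s, w_s = theta.copy(), w.copy()
        mu_theta = -A.T @ w_s              # grad_theta L at the snapshot
        mu_w = b - A @ theta_s - C @ w_s   # grad_w L at the snapshot

        for _ in range(inner_steps):
            t = rng.integers(n)
            A_t = np.outer(phi[t], diffs[t])
            # Variance-reduced gradient estimates: the per-sample terms at
            # the snapshot act as control variates, so the b_t terms cancel.
            v_theta = -A_t.T @ (w - w_s) + mu_theta
            v_w = -A_t @ (theta - theta_s) - phi[t] * (phi[t] @ (w - w_s)) + mu_w
            theta -= sigma_theta * v_theta  # descend in the primal variable
            w += sigma_w * v_w              # ascend in the dual variable
    return theta, w
```

At the saddle point, theta satisfies A theta = b, i.e., it recovers the LSTD solution whenever the empirical matrix A is invertible.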


Related articles

Stochastic Variance Reduction for Policy Gradient Estimation

Recent advances in policy gradient methods and deep learning have demonstrated their applicability for complex reinforcement learning problems. However, the variance of the performance gradient estimates obtained from the simulation is often excessive, leading to poor sample efficiency. In this paper, we apply the stochastic variance reduced gradient descent (SVRG) technique [1] to model-free p...
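
For reference, the SVRG construction mentioned here replaces the plain stochastic gradient with a control-variate-corrected estimate anchored at a periodic snapshot. A generic sketch, where `grad_i` is a hypothetical per-sample gradient oracle:

```python
import numpy as np

def svrg_epoch(x, grad_i, n, lr, inner_steps, rng):
    """One SVRG epoch: anchor a snapshot, compute its full gradient once,
    then take cheap inner steps with variance-reduced gradient estimates.
    `grad_i(x, i)` is a hypothetical per-sample gradient oracle."""
    x_snap = x.copy()
    full_grad = sum(grad_i(x_snap, i) for i in range(n)) / n
    for _ in range(inner_steps):
        i = rng.integers(n)
        # Unbiased estimate whose variance shrinks as x approaches x_snap.
        v = grad_i(x, i) - grad_i(x_snap, i) + full_grad
        x = x - lr * v
    return x
```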

Performance Evaluation and Policy Selection in Multiclass Networks

This paper concerns modelling and policy synthesis for the regulation of multiclass queueing networks. A two-parameter network model is introduced to allow independent modelling of variability and mean processing rates while keeping the model simple. Policy synthesis is based on considering more tractable workload models and then translating a policy from this abstraction to the dis...

Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines

Policy gradient methods have enjoyed great success in deep reinforcement learning but suffer from high variance of gradient estimates. The high-variance problem is particularly exacerbated in problems with long horizons or high-dimensional action spaces. To mitigate this issue, we derive a bias-free action-dependent baseline for variance reduction which fully exploits the structural form of the...

Deep Reinforcement Learning

In reinforcement learning (RL), stochastic environments can make learning a policy difficult due to high variance in the returns. As such, variance reduction methods have been investigated in other works, such as advantage estimation and control-variates estimation. Here, we propose to learn a separate reward estimator to train the value function, to help reduce variance caused by a noisy reward sig...
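
One plausible reading of this idea, sketched here under our own assumptions (a linear reward model and TD(0) with linear value features, neither confirmed by the truncated abstract): fit a reward estimator by least squares and use its prediction in place of the noisy observed reward when forming TD targets.

```python
import numpy as np

def td0_with_reward_model(phi, phi_next, rewards, gamma=0.99, lr=0.05):
    """Hypothetical sketch: TD(0) where the noisy observed reward is
    replaced by the prediction of a separately fitted reward model."""
    n, d = phi.shape
    # Least-squares reward estimator: r_hat(s) ~ w_r^T phi(s).
    w_r, *_ = np.linalg.lstsq(phi, rewards, rcond=None)
    theta = np.zeros(d)
    for t in range(n):
        r_hat = phi[t] @ w_r                                   # denoised reward
        td_err = r_hat + gamma * (phi_next[t] @ theta) - phi[t] @ theta
        theta += lr * td_err * phi[t]                          # TD(0) update
    return theta
```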

Integrated Variance Reduction Strategies for Simulation

We develop strategies for integrated use of certain well-known variance reduction techniques to estimate a mean response in a finite-horizon simulation experiment. The building blocks for these integrated variance reduction strategies are the techniques of conditional expectation, correlation induction (including antithetic variates and Latin hypercube sampling), and control variates; and all p...
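
As a small illustration of combining two of these building blocks, the sketch below pairs antithetic variates with a regression-estimated control variate to estimate E[g(U)] for U ~ Uniform(0, 1); the function and its interface are our own, not the paper's integrated scheme.

```python
import numpy as np

def antithetic_control_estimate(g, h, h_mean, n_pairs, rng=None):
    """Estimate E[g(U)], U ~ Uniform(0, 1), combining antithetic variates
    (pairing U with 1 - U) with a control variate h of known mean h_mean.
    Illustrative sketch only."""
    rng = rng or np.random.default_rng()
    u = rng.random(n_pairs)
    y = 0.5 * (g(u) + g(1.0 - u))        # antithetic pairing of the response
    c = 0.5 * (h(u) + h(1.0 - u))        # same pairing for the control
    cov = np.cov(y, c)
    beta = cov[0, 1] / cov[1, 1]         # estimated optimal coefficient
    return np.mean(y - beta * (c - h_mean))

# Example: estimate E[exp(U)] = e - 1, using h(u) = u with E[h(U)] = 0.5.
estimate = antithetic_control_estimate(np.exp, lambda u: u, 0.5, n_pairs=10_000)
```

The antithetic pairing induces negative correlation for monotone g, while the control variate removes the part of the remaining noise that is linearly predictable from h.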



Publication date: 2017